Components of a Quantitative Model of German Intonation

Author

  • Bernd Möbius

Abstract

In this paper a quantitative description of German intonation is presented. It will be demonstrated that intonation contours can be efficiently analyzed, and predicted, by interpreting the components and parameters of Fujisaki’s model in terms of linguistic features and categories. It will also be argued that a superpositionally organized model is particularly suitable for a quantitative description.

STRUCTURE OF INTONATION

Tone Sequences Or Layered Components?

Two major classes of intonation models have evolved in the course of the last two decades. There are, on the one hand, hierarchically organized models which interpret F0 contours as a complex pattern resulting from the superposition of several components. Their counterparts are usually seen in the models which claim that F0 contours are generated from a sequence of phonologically distinctive tones, or categorially different pitch accents, that are locally determined and do not interact. Two quotations illustrate the competing points of view:

“[...] the pitch movements associated with accented syllables are themselves what make up sentence intonation [...] there is no layer or component of intonation separate from accent: intonation consists of a sequence of accents, or, to put it more generally, a sequence of tonal elements.” ([9], p. 40)

“[...] Standard Danish intonational phenomena are structured in a hierarchically organized system, where components of smaller temporal scope are superposed on components of larger temporal domain [...] These components are simultaneous, parametric, non-categorical and highly interacting in their actual production.” ([25], p. 2)

Ladd [9, 10] argues that although the tone sequence and the superpositional models diverge in formal and notational terms, they may nevertheless be more similar from a descriptive point of view than is usually admitted. Although I agree with the argument ([24], p. 1041) that the two types of intonation models differ not only in formal respects but from a conceptual point of view as well, I don’t think they are ultimately incompatible. As a matter of fact, in more recent publications (e.g., [11]) Ladd proposed a metrical approach that incorporates both linear and hierarchical elements.

The main difference between the ’pure’ linear and overlaying models can be seen in how the relation between local movements and global trends in the intonation contour is defined, or, in other words, in the view of the relation between word accent and sentence intonation. The underlying problem is that word- and utterance- (or phrase-) prosodic aspects are all expressed through one and the same acoustic variable: the variation of fundamental frequency as a function of time. There is no way of deciding, either by acoustic measurements or by perceptual criteria, whether F0 movements are caused by accentuation or by intonation. A separation of these effects, however, can be made on a linguistic, i.e. more abstract, level of description. Here rules can be formulated that predict accent- or intonation-related patterns independent of, as well as in interaction with, each other.

Autosegmental theory allows for the independence of various levels of suprasegmental description and their respective effects on the intonation contour by an appropriate phonological representation. According to Edwards and Beckman [2], the most promising principle of intonation models ought to be seen in the capability to determine the effects of each individual level, and of their interactions. Although probably not intended by the authors, this is precisely the most important argument in favor of a hierarchical approach and of superpositional models of intonation. Thus, the conceptual gap between the different theories of intonation does not seem to be too wide to be bridged. After presenting supporting data, I will continue this line of argument in the concluding section.
Motivation For A Superpositional Approach

Even among researchers representing different types of intonation models there is widespread agreement that the F0 contour of an utterance should be regarded as the complex result of effects exerted by a multitude of factors. Some of these factors are related to articulatory or segmental effects, but others clearly have to be assigned to linguistic categories.

Contrary to the explicit assumption in [20] that intonation is determined exclusively on a local level, there is ample evidence for non-local factors. In a study of utterances containing parentheses [8], the authors show that the intonation contour is interrupted by the parenthesis and resumed right afterwards in the way it would have continued in the ’same’ utterance without the parenthesis. Also, in [12] the authors explain how the first accent peak in an utterance is adjusted depending on the underlying syntactic constituent structure. Furthermore, there is some evidence that the speaker pre-plans the global aspects of the intonation contour, not only with respect to utterance-initial F0 values but with respect to phrasing and inter-stress intervals as well [23].

These considerations obviously favor models that directly represent both global and local properties of intonation. These models also provide a way of extracting prosodic features related to the syntactic structure of the utterance and to sentence mode. Generally speaking, the analytical separation of all the potential factors helps considerably in deciding under which conditions and to what extent the concrete shape of a given F0 contour is determined by linguistic factors (including lexical tone), by non-linguistic factors such as intrinsic and coarticulatory F0 variations, and by speaker-dependent factors.
Superpositionally organized models lend themselves to such a quantitative approach: contours generated by such a model result from an additive superposition of components that are in principle orthogonal to, or independent of, each other. The components in turn can be related to certain linguistic or non-linguistic categories. Thus, the factors contributing to the variability of F0 contours can be investigated separately. In addition, the temporal course pertinent to each individual component can be computed independently. A production-oriented model that provides components for accentuation on the one hand and for sentence or phrase intonation on the other, and that generates the pertinent patterns by means of parametric commands, appears to be particularly promising.

The only approach exploiting the principle of superposition in a strictly mathematical sense is the model proposed by Fujisaki and co-workers (e.g., [3, 4, 5]). This particular model has several advantages. Since it satisfies the principle of superposition, the respective effect of a given factor can be determined for a predefined temporal segment or for a given linguistically or prosodically defined unit, such as a phrase or a stress group. For every desired point in time in the course of an utterance, the resulting F0 value can be computed. The values of the model parameters (see following section) are constant at least within one stress group. This data reduction can be an important aspect for certain applications such as speech synthesis. The smooth contour resulting from the superposition of the model’s components is appropriate for the approximation of naturally produced F0 contours.

Generally speaking, adequate models are expected to provide both predictive and explanatory elements [1]. In terms of prediction, models have to be as precise and quantitative as possible, ideally being mathematically formulated.
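The additive superposition described above can be illustrated with a short sketch. It follows the commonly cited formulation of Fujisaki's model, which this excerpt does not spell out: ln F0(t) is the sum of a constant baseline (ln Fmin), the responses to impulse-like phrase commands, and the responses to step-like accent commands. The function names, the time constants (alpha = 3/s, beta = 20/s), the ceiling gamma = 0.9, the 80 Hz baseline, and the command timings and amplitudes are all illustrative assumptions, not values taken from the paper.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def phrase_component(t, phrase_cmds, alpha=3.0):
    """Sum of the responses to all phrase commands (onset time, amplitude Ap)."""
    return sum(ap * phrase_response(t - t0, alpha) for t0, ap in phrase_cmds)


def accent_component(t, accent_cmds, beta=20.0, gamma=0.9):
    """Sum of the responses to all accent commands (onset, offset, amplitude Aa)."""
    return sum(
        aa * (accent_response(t - t1, beta, gamma) - accent_response(t - t2, beta, gamma))
        for t1, t2, aa in accent_cmds
    )


def f0(t, f_min, phrase_cmds, accent_cmds):
    """F0 in Hz at time t: the components are added in the log-frequency domain."""
    ln_f0 = (
        math.log(f_min)
        + phrase_component(t, phrase_cmds)
        + accent_component(t, accent_cmds)
    )
    return math.exp(ln_f0)


# Hypothetical utterance: one phrase command and two accent commands.
phrase_cmds = [(0.0, 0.5)]                         # (onset in s, amplitude)
accent_cmds = [(0.3, 0.6, 0.4), (1.0, 1.3, 0.3)]   # (onset, offset, amplitude)
f_min = 80.0                                       # assumed baseline in Hz

# F0 sampled every 10 ms over two seconds.
contour = [f0(0.01 * i, f_min, phrase_cmds, accent_cmds) for i in range(200)]
```

Because the components are simply added in the logarithmic domain, the contribution of the phrase commands and that of the accent commands can each be evaluated on its own, for any point in time, which is the property the text singles out as useful for analysis.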
A model provides explanations if it is capable of analyzing a complex system in such a way that both the effects of individual components and their combined results become apparent. Fujisaki’s model meets both requirements, and all effects can be described uniquely by their causes. The model does not, however, explain by itself why a given component behaves the way it does. The particular approach and the application presented in this paper aim at providing these explanations, especially by applying a linguistic interpretation of the model’s components. Another explanatory approach can be seen in the potential physiological foundation ...

[Figure: diagram of the model. Recoverable labels: ln F0, phrase component, accent component, Fmin, phrase control mechanism, accent control mechanism, Ap ...]


Similar resources

A novel approach to the fully automatic extraction of Fujisaki model parameters

The generation of natural-sounding F0 contours in TTS is important for the intelligibility and perceived naturalness of synthetic speech. In earlier works the author developed a linguistically motivated model of German intonation based on the quantitative Fujisaki model of the production process of F0. The extraction of parameters for this model from the extracted F0 contour, however, poses p...


The Prosody of Modern Hebrew - A Quantitative Study

The current paper presents a preliminary study of Modern Hebrew prosody using a quantitative model. In a small corpus of segmentally identical utterances, the place of focus and the sentence mode were systematically varied. Following the rationale of MFGI, the Mixdorff-Fujisaki Model of German Intonation, the phonetic properties of the contrast-lending intonemes were examined. These properties are r...


Learning the parameters of quantitative prosody models

The article introduces a novel hybrid data-driven and rule-based approach to prosody control in a TTS system, which combines the advantages of well-balanced, quantitative models with the flexible training of derived model parameters. Taking the training of Fujisaki intonation parameters for German (MFGI) as an example, the article describes the hybrid data-driven and rule-based architecture HYDRA, th...


A system of stylized intonation contours in German

Modeling intonation, i.e., specifying adequate fundamental frequency (F0) contours, remains a challenging task for speech synthesis systems. This paper discusses the development of a system for phonetically specifying intonation contours for German. It deals with the problem of translating an abstract phonological representation of intonation, namely the tone-sequence model, into a concrete phone...


A quantitative description of German prosody offering symbolic labels as a by-product

The prosodic quality of a text-to-speech system is important for the intelligibility and perceived naturalness of synthetic speech. In earlier works the author developed a linguistically motivated model of German intonation based on the quantitative Fujisaki model of the production process of F0. The current paper compares results yielded by automatic Fujisaki modeling with a GToBI-style annotat...




Publication date: 1995